The genome sequence of wood avens, Geum urbanum L., 1753

We present a genome assembly from an individual Geum urbanum (the wood avens; Streptophyta; Magnoliopsida; Rosales; Rosaceae). The genome sequence is 1,304.9 megabases in span. Most of the assembly is scaffolded into 21 chromosomal pseudomolecules. The mitochondrial and plastid genomes have also been assembled and are 335.5 and 156.1 kilobases in length, respectively. Gene annotation of this assembly on Ensembl identified 50,336 protein-coding genes.


Background
Geum urbanum L. (Rosaceae) is a widespread European perennial herb, the range of which extends to western Asia, western Siberia, and the northwest coast of Africa (Taylor, 1997). It is native to Britain and Ireland and occurs abundantly, except in some parts of northern Scotland and Ireland (Preston et al., 2002; Stace et al., 2019). As implied by its common name, wood avens, G. urbanum typically grows in woodland, shrubland, and hedgerows with well-drained conditions, but is also found in disturbed and more open habitats, waste grounds, gardens and parks (Ruhsam et al., 2011; Taylor, 1997). It is a predominantly self-pollinating species, with outcrossing rates ranging from 0.058 to 0.177 in natural populations (Ruhsam et al., 2010), yet its small, erect, yellow flowers can still attract pollinators (Figure 1). The achene fruits of G. urbanum have a single hook (Figure 1), which makes the seeds well-adapted to dispersal by animals (Chen et al., 2013; Gorb & Gorb, 2002; Smedmark & Eriksson, 2006).
Cytogenetic evidence shows that G. urbanum is an ancient hexaploid (2n = 42) (Gajewski, 1957; Gajewski, 1958), with molecular studies suggesting that allopolyploidisation gave rise to this hexaploid lineage in Rosoideae (Gajewski, 1957; Smedmark et al., 2005; Smedmark et al., 2003). However, recent genetic studies show that this species largely behaves as a diploid, although with some additional duplicated gene copies (Jordan et al., 2018; Ruhsam, 2009). This species is known for its rampant hybridisation with a closely related species, G. rivale, where both occur in close proximity. These two species have several contrasting attributes, including mating system, flower morphology and habitat preference. Apart from its interesting biological features, many of the secondary metabolites of G. urbanum have important pharmacological uses (Al-Snafi, 2019). This genome will be extremely helpful for evolutionary studies aimed at understanding historical and contemporary hybridisation (Jordan et al., 2018; Ruhsam et al., 2011) and the genetic basis of the selfing syndrome (Sicard & Lenhard, 2011). It will also further contribute to uncovering the potential medical value of compounds produced by G. urbanum.

Genome sequence report
The genome was sequenced from a Geum urbanum specimen collected from a garden bed at the Royal Botanic Gardens, Kew (latitude 51.48, longitude -0.30). Using flow cytometry, the genome size (1C-value) was estimated to be 1.64 pg, equivalent to 1,610 Mb. A total of 27-fold coverage in Pacific Biosciences single-molecule HiFi long reads and 64-fold coverage in 10X Genomics read clouds were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 5 missing joins or mis-joins and removed one haplotypic duplication, reducing the scaffold number by 13.33%.
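The figures in this paragraph can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the conversion factor of 978 Mb per picogram is a commonly used value and is an assumption here, since the note does not state which factor was applied, and the function names are hypothetical. The final scaffold count (26) and the assembly span (1,304.9 Mb) are taken from the assembly statistics reported elsewhere in this note.

```python
# Assumed conversion factor: 1 pg of DNA is roughly 978 Mb.
# The note does not say which factor was used, so treat this as an assumption.
PG_TO_MB = 978.0


def pg_to_mb(c_value_pg: float) -> float:
    """Convert a 1C-value in picograms to megabases."""
    return c_value_pg * PG_TO_MB


def scaffolds_before_curation(final_count: int, reduction_pct: float) -> int:
    """Back-calculate the scaffold count before curation from the
    final count and the reported percentage reduction."""
    return round(final_count / (1.0 - reduction_pct / 100.0))


estimated_mb = pg_to_mb(1.64)                    # flow cytometry estimate
before = scaffolds_before_curation(26, 13.33)    # 26 final scaffolds
assembled_fraction = 1304.9 / 1610.0             # assembly span / reported estimate

print(f"1.64 pg is about {estimated_mb:.0f} Mb with the assumed factor")
print(f"Scaffolds before curation: {before}")
print(f"Assembly span as a fraction of the estimate: {assembled_fraction:.2f}")
```

With the assumed factor the conversion gives a value slightly below the reported 1,610 Mb; the difference may reflect rounding of the picogram value or a different conversion factor, which this sketch cannot determine.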

Amendments from Version 1
The genome assembly presented here has been annotated by the European Bioinformatics Institute since the first version of this data note, and we have updated the article by linking to the annotation data and method.
Any further responses from the reviewers can be found at the end of the article.

The final assembly has a total length of 1,304.9 Mb in 26 sequence scaffolds with a scaffold N50 of 65.2 Mb (Table 1). Most (99.95%) of the assembly sequence was assigned to 21 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 2-Figure 5; Table 2).
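For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal sketch shows the usual definition: the length of the scaffold at which the cumulative length of size-ordered scaffolds first reaches 50% of the total assembly span. The function name `nx` and the toy scaffold lengths are hypothetical and are not taken from the drGeuUrba1.1 assembly; this is not the code used to produce the statistics in this note.

```python
from typing import Iterable


def nx(lengths: Iterable[int], x: float = 50.0) -> int:
    """Return the Nx length: the length of the scaffold at which the
    running total of scaffolds, sorted longest first, reaches x% of
    the total span."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    threshold = sum(ordered) * x / 100.0
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Toy example (hypothetical lengths, total span 300)
toy_lengths = [100, 80, 60, 40, 20]
print("N50:", nx(toy_lengths, 50))
print("N90:", nx(toy_lengths, 90))
```

The same function with `x=90` gives the N90 value of the kind shown alongside the N50 in the snail plot (Figure 2).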

Genome annotation report
The Geum urbanum genome assembly (GCA_946800695.1) was annotated at the European Bioinformatics Institute (EBI) on Ensembl Rapid Release. The resulting annotation includes 75,552 transcribed mRNAs from 50,336 protein-coding and 10,365 non-coding genes (Table 2; https://rapid.ensembl.org/Geum_urbanum_GCA_946800695.1/Info/Index). The average transcript length is 2,623.32. There are 1.24 coding transcripts per gene and 5.16 exons per transcript.
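The per-gene ratio above can be compared against the gene and transcript counts given in the same paragraph. The sketch below is purely an arithmetic check, using a hypothetical helper name: it makes no claim about how Ensembl derives its summary statistics, and the choice of denominator is an assumption explored for illustration.

```python
def transcripts_per_gene(transcripts: int, genes: int) -> float:
    """Mean number of transcripts per gene."""
    return transcripts / genes


mrnas = 75_552            # transcribed mRNAs reported in the annotation
coding_genes = 50_336     # protein-coding genes
noncoding_genes = 10_365  # non-coding genes

# Two possible denominators; which one underlies the reported figure
# is not stated in the note, so both are shown.
per_coding_gene = transcripts_per_gene(mrnas, coding_genes)
per_any_gene = transcripts_per_gene(mrnas, coding_genes + noncoding_genes)

print(f"Per protein-coding gene: {per_coding_gene:.2f}")
print(f"Per gene (coding + non-coding): {per_any_gene:.2f}")
```

Of the two denominators, the combined gene count is the one that reproduces the reported value of 1.24.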
Metadata for specimens, spectral estimates, sequencing runs, contaminants and pre-curation assembly statistics can be found at https://links.tol.sanger.ac.uk/species/57919.

Sample acquisition, genome size estimation and nucleic acid extraction
A specimen of Geum urbanum (drGeuUrba1) was collected from Bed 227 of the Rhododendron Dell at the Royal Botanic Gardens, Kew (latitude 51.48, longitude -0.30) on 26 August 2020. The specimen was picked by hand from weedy vegetation on the edge of the lawn by Maarten Christenhusz (Royal Botanic Gardens, Kew), collection number 9055. The specimen was identified based on its morphology by Maarten Christenhusz, and was preserved by freezing at -80°C.
The genome size (1C-value) was estimated by flow cytometry using the fluorochrome propidium iodide, following the 'one-step' method outlined in Pellicer et al. (2021). Specifically for this species, the General Purpose Buffer (GPB) supplemented with 3% PVP and 0.08% (v/v) beta-mercaptoethanol was used for isolation of nuclei (Loureiro et al., 2007), and the internal calibration standard was Petroselinum crispum 'Champion Moss Curled' with an assumed 1C-value of 2,200 Mb (Obermayer et al., 2002).
The mitochondrial and chloroplast genomes were assembled using MBG (Rautiainen & Marschall, 2021) from PacBio HiFi reads mapping to related genomes; a representative circular sequence was selected for each from the graph based on read coverage.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The Ensembl Genebuild annotation system (Aken et al., 2016) was used to generate annotation for the Geum urbanum assembly (GCA_946800695.1) in Ensembl Rapid Release at the EBI. Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt (UniProt Consortium, 2019).

Wellcome Sanger Institute -Legal and Governance
The materials that have contributed to this genome

RNA was extracted from leaf tissue of drGeuUrba1 in the Tree of Life Laboratory at the WSI using TRIzol, according to the manufacturer's instructions. RNA was then eluted in 50 μl RNAse-free water and its concentration assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit RNA Broad-Range (BR) Assay kit. Analysis of the integrity of the RNA was done using Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Sequencing
Pacific Biosciences HiFi circular consensus and 10X Genomics read cloud DNA sequencing libraries were

Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material reported.
2. Since the authors also assembled the mitochondrial and plastid genomes, these genome sequences should be deposited in a publicly available database. How many chloroplast and mitochondrial genes could be identified?
3. The scripts used for the analyses are valuable to the research community and should be publicly available.
4. In the abstract, "Most of the assembly" can be replaced with "99.95% of the assembly" for better readability.
Is the rationale for creating the dataset(s) clearly described? Yes
Results should be described in detail because there is no information on how many genes, transcription factors, and markers have been identified from the genome of Geum urbanum. Why did the authors not submit the genome sequences at NCBI and Phytozome databases? How many differentially expressed genes and proteins are identified from this study? Hence, I advise the authors to please refer to already published articles and improve the contents of the manuscript.
Reviewer Expertise: Functional Genomics and Genome Editing
I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Author Response 10 Jul 2024

Tree of Life Team Sanger
This data note reports on the whole genome assembly and its quality. This serves as a resource for researchers. It is not a research paper reporting on differentially expressed genes, transcription factors, and markers, although the resource is freely available to researchers who are studying the details of this species. The genome assembly is available on the NCBI (GCA_946800695.1) as well as on ENA. We deposit the data on ENA, which is part of the INSDC, along with NCBI and the DNA DataBank of Japan (DDBJ). The INSDC's main policy is to provide permanent, free, and unrestricted access to all archived nucleotide data. The three organisations exchange data daily. Phytozome prioritises species sequenced at the Joint Genome Institute and selected species sequenced elsewhere and includes functional annotation. Since this genome assembly can be highly useful for researchers conducting functional annotation, we opted to deposit our data in the widely accessible INSDC databases. We believe that the current data note provides valuable information on the whole genome assembly and its quality, offering a crucial resource for further research. Given that our primary aim is to present the genome assembly rather than conduct a detailed functional annotation, we have ensured that the data is available through established and widely accessible platforms like NCBI and ENA. This approach aligns with our objective to make the genome data freely available to the research community. Since the first version of this data note, this Geum urbanum assembly has been annotated at the European Bioinformatics Institute (https://rapid.ensembl.org/Geum_urbanum_GCA_946800695.1/Info/Index), and we have updated the genome note in the second version to report on this annotation.
Competing Interests: No competing interests were disclosed.
Recommendations:
1. LAI should be used to evaluate genome integrity.
2. Complete the genome annotation and genome feature analysis as soon as possible.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Identification and evaluation of plant germplasm resources
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Geum urbanum (not the sampled specimen) growing in secondary woodland. (a) The plant habit with five-petal flowers in late May. (b) Two small insects visiting the flower of G. urbanum. (c) A fruiting head of G. urbanum. Photos taken by Meng Lu.

Figure 2. Genome assembly of Geum urbanum, drGeuUrba1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 1,304,870,458 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (94,240,583 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (65,224,196 and 44,983,829 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the eudicots_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/drGeuUrba1.1/dataset/CAMPEP01/snail.

Figure 3. Genome assembly of Geum urbanum, drGeuUrba1.1: GC coverage. BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/drGeuUrba1.1/dataset/CAMPEP01/blob.
DNA was extracted at the Tree of Life laboratory, Wellcome Sanger Institute (WSI). The drGeuUrba1 sample was weighed and dissected on dry ice with tissue set aside for Hi-C sequencing. Leaf tissue was cryogenically disrupted to a fine powder using a Covaris cryoPREP Automated Dry Pulveriser, receiving multiple impacts. High molecular weight (HMW) DNA was extracted using the Illustra Nucleon PhytoPure HMW DNA extraction kit. HMW DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30. Sheared DNA was purified by solid-phase reversible immobilisation using AMPure PB beads with a 1.8× ratio of beads to sample to remove the shorter fragments and concentrate the DNA sample. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and Qubit Fluorometer and Qubit dsDNA

Figure 4. Genome assembly of Geum urbanum, drGeuUrba1.1: cumulative sequence. BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/drGeuUrba1.1/dataset/CAMPEP01/cumulative.

Figure 5. Genome assembly of Geum urbanum, drGeuUrba1.1: Hi-C contact map. Hi-C contact map of the drGeuUrba1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=JSrsAc2aSfi-e9L51ilEkA.
note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet

High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genome assembly, Comparative Genomics, Evolutionary Genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

doi.org/10.21956/wellcomeopenres.21781.r86519
© 2024 Maharajan T. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Theivanayagam Maharajan
Division of Plant Molecular Biology and Biotechnology, Department of Biosciences, Rajagiri College of Social Sciences, Cochin, Kerala, India
Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Partly
Are sufficient details of methods and materials provided to allow replication by others? No
Are the datasets clearly presented in a useable and accessible format? No
Competing Interests: No competing interests were disclosed.

Reviewer Report 09 May 2024
https://doi.org/10.21956/wellcomeopenres.21781.r78942
© 2024 Wu Z. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Zinian Wu
Institute of Grassland Research, Chinese Academy of Agricultural Sciences, Hohhot, China
In this study, the high-quality genome of Geum urbanum was assembled. The scaffold N50 was 65.2 Mb, reaching the chromosome level. The estimated Quality Value (QV) of the final assembly is 59.6 with k-mer completeness of 99.99%, and the assembly has a BUSCO v5.3.2 completeness of 98.7%. It's a pretty perfect job.